Signal-processing apparatus and electronic apparatus using same

ABSTRACT

A signal-processing apparatus comprises an instruction-parallel processor, a first data-parallel processor, a second data-parallel processor, and a motion detection unit, a de-blocking filtering unit and a variable-length coding/decoding unit which are dedicated hardware. With this structure, in signal processing of an image compression and decompression algorithm with a large processing amount, the load is distributed between software and hardware, so that the signal-processing apparatus can realize high processing capability and flexibility.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a signal-processing apparatus that performs audio and image compression/decompression at high speed by use of a parallel processor and dedicated hardware, and an electronic apparatus using the same.

2. Description of the Related Art

In response to the recent trend toward higher performance and downsizing of image processing apparatuses and image display apparatuses that handle moving images, the ISO (International Organization for Standardization) and the ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) are co-planning the standardization of MPEG-4 AVC (Advanced Video Coding) as a next-generation compression and decompression technology. The MPEG-4 AVC realizes a high image compression rate by introducing new technologies such as an integer transform of 4×4 pixels, intra prediction in up to nine directions, seven kinds of sub-macro-block types, up to 16 motion vectors per macro-block, multi-frame reference, a de-blocking filter in the loop and arithmetic coding, and aims at a code amount compressed to 50% of that of the MPEG-2, which has already been put into practical use.

However, the newly introduced coding tools adopt algorithms that attach importance to coding efficiency; therefore, the processing amount is large and implementation in an embedded system is difficult.

Prior signal-processing apparatuses that perform compression and decompression with such encoding methods have used parallel processing by a processor and structures based on dedicated hardware.

An example of speed-enhanced signal processing using the parallel processing method by a processor is Document 1 (Japanese Patent Application No. H03-508269). The example shown in Document 1 is a parallel processor comprising a combination of a parallel data processor of the SIMD (Single Instruction Multiple Data) type, in which the number of control streams is one and the number of data streams to be processed is more than one, and a parallel data processor of the MIMD (Multiple Instruction Multiple Data) type, in which the number of control streams and the number of data streams are both more than one.

FIG. 16, which is taken from FIG. 1 of Document 1, is a block diagram illustrating a signal-processing apparatus combining a prior SIMD parallel data processor 902 and an MIMD parallel data processor 903.

The signal-processing apparatus comprises a system controller 901 that controls the entire processor, the SIMD parallel data processor 902, the MIMD parallel data processor 903, a shared memory bus 904 and a shared memory 905.

The system controller 901 performs execution of application programs.

The SIMD parallel data processor 902 comprises an overall controller 910, calculators 911 to 914 and local memories 915 to 918. One calculator and one local memory constitute one processor. The overall controller 910 executes the program, and issues the same instruction to all of the calculators 911 to 914. The calculators 911 to 914 process data stored in the local memories 915 to 918, respectively, based on the same issued instruction.

The MIMD parallel data processor 903 comprises an overall controller 920, controllers 921 to 924, calculators 925 to 928 and local memories 929 to 932. One controller, one calculator and one local memory constitute one processor. A different program is executed by each of the controllers 921 to 924, a different instruction is issued to each of the calculators 925 to 928, and the data stored in each of the local memories 929 to 932 is processed. The overall controller 920 performs control for synchronization and monitoring of the entire MIMD parallel data processor 903.
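
The difference between the two prior structures can be sketched as follows. This is only an illustrative model, not the implementation of Document 1: the function names `simd_step` and `mimd_step` are hypothetical, and each Python list stands in for one local memory.

```python
from typing import Callable, List

Instruction = Callable[[int], int]


def simd_step(instruction: Instruction, local_memories: List[List[int]]) -> List[List[int]]:
    """SIMD: one control stream; every calculator applies the same instruction."""
    return [[instruction(x) for x in memory] for memory in local_memories]


def mimd_step(programs: List[Instruction], local_memories: List[List[int]]) -> List[List[int]]:
    """MIMD: one control stream per processor; each calculator runs its own program."""
    return [[program(x) for x in memory]
            for program, memory in zip(programs, local_memories)]


# Four local memories, standing in for 915 to 918 (SIMD) or 929 to 932 (MIMD).
memories = [[1, 2], [3, 4], [5, 6], [7, 8]]

simd_result = simd_step(lambda x: x * 2, memories)
mimd_result = mimd_step(
    [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x * x],
    memories,
)
```

In the SIMD case a single instruction is broadcast to all four data streams, whereas in the MIMD case each of the four processors follows a different control stream.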

In the parallel data processor as described above, when the target processing is simple and the data processing amount is large, the SIMD parallel data processor 902 performs processing, whereas when the target processing is complicated and the data processing amount is small, the MIMD parallel data processor 903 performs processing.

On the other hand, another speed-enhancing method improves calculation performance by implementing, as dedicated hardware, the calculator best suited to the target processing. As an example thereof, Document 2 (Japanese Patent Application No. 2000-118434) discloses a technology that realizes a speedup of the processing by performing the variable-length encoding/decoding of the image processing with dedicated hardware.

FIG. 17, which is taken from FIG. 1 of Document 2, is a block diagram illustrating an image processor 1001 combining the prior SIMD parallel data processor and the dedicated hardware.

The image processor 1001 is connected to an external video input device 1009, a video output device 1010 and an external memory 1011 through an external video data bus 1008. The image processor 1001 comprises an instruction memory 1002, a processor 1003, SIMD calculating means 1004, VLC (Variable-Length Coding) processing means 1005, an external data interface 1006, and an internal data bus 1007.

The VLC processing means 1005 comprises the dedicated hardware.

The processor 1003 performs scalar operations, bit manipulation and the issuance of comparison and branch instructions, decodes the instructions held in the instruction memory 1002, and controls the SIMD calculating means 1004, the VLC processing means 1005, the external data interface 1006, the video input device 1009 and the video output device 1010.

The video input device 1009 inputs the video signals from the outside, and the video output device 1010 outputs the video data to the outside.

The image data inputted by the video input device 1009 is transferred to the external memory 1011, and at the next step, is transferred to the external data interface 1006 according to the processing performed by the SIMD calculating means 1004. The SIMD calculating means 1004 performs motion compensation, DCT and quantization processing, and acquires transformed coefficient data. At the next step, the VLC processing means 1005 encodes the transformed coefficient data by variable-length encoding, and the bit stream is generated.

The SIMD calculating means 1004, which comprises eight parallel pipeline calculators, is capable of efficiently performing routine processing such as DCT.

The signal-processing apparatus comprising a combination of the SIMD data-parallel processor and the MIMD data-parallel processor, as typified by the above-described Document 1, is flexible toward various coding algorithms. Thus, the signal-processing apparatus can sufficiently handle prior image processing by enhancing the degree of parallelism. This is because the prior motion detection processing is for macro-block sizes of not less than 8×8 pels and not more than 16×16 pels.

However, in the MPEG-4 AVC the smallest sub-macro-block size is 4×4 pels; therefore, with the prior signal-processing apparatus, the processing efficiency of the calculators does not improve even if 16 or more parallel calculators are provided.
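
The limit can be illustrated with a small utilization estimate. The sketch below assumes, purely for illustration, that one 4×4 sub-macro-block (16 pels) is processed at a time with one pel per calculator; the function name `lane_utilization` is hypothetical.

```python
import math


def lane_utilization(calculators: int, block_pels: int = 16) -> float:
    """Fraction of calculators doing useful work on one block of block_pels pels."""
    # Number of passes needed to cover every pel of the block.
    passes = math.ceil(block_pels / calculators)
    return block_pels / (passes * calculators)


utilization = {n: lane_utilization(n) for n in (8, 16, 32, 64)}
```

Under this assumption, 8 or 16 calculators are fully occupied by a 4×4 block, whereas with 32 calculators only half of them have a pel to process, so adding calculators beyond 16 does not shorten the processing of the block.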

Moreover, in the arithmetic coding/decoding processing of the MPEG-4 AVC, since the processing is performed while the probability of occurrence is changed in accordance with the contexts of peripheral macro-blocks, it is necessary to perform coding bit by bit, which means that parallel processing cannot be performed. That is, with the prior signal-processing apparatus, the processing performance in the MPEG-4 AVC cannot be improved even if the degree of parallelism of the MIMD parallel data processor is enhanced.
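
The bit-by-bit dependency can be seen in a minimal sketch of an adaptive probability model. This is not the arithmetic coder of the MPEG-4 AVC; it is a simplified count-based estimator, assumed here only to show that the cost of each bit depends on state updated by all preceding bits. The name `adaptive_code_length` is hypothetical.

```python
import math
from typing import Sequence


def adaptive_code_length(bits: Sequence[int]) -> float:
    """Ideal code length in bits when the probability estimate adapts after every bit."""
    ones = 0
    total = 0
    cost = 0.0
    for bit in bits:
        # The estimate used for this bit depends on every bit coded before it,
        # so the loop iterations cannot be evaluated independently.
        p_one = (ones + 1) / (total + 2)
        p = p_one if bit == 1 else 1.0 - p_one
        cost += -math.log2(p)
        ones += bit
        total += 1
    return cost


cost_repeated = adaptive_code_length([1, 1])
cost_changed = adaptive_code_length([1, 0])
```

Because the second bit is coded with a probability that already reflects the first bit, a repeated symbol costs less than a changed one, and the two bits cannot be coded by separate processors at the same time.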

In de-blocking filters of the MPEG-4 AVC, the filter parameter is calculated in the unit of sub-macro-blocks of 4×4 pels, and filtering processing is performed based on the result. When an SIMD calculator is used, although filtering processing can be performed in parallel, the calculators cannot be effectively used in determination processing.

Moreover, with the signal-processing apparatus comprising a combination of the SIMD data-parallel processor and the dedicated hardware, as typified by the above-described Document 2, although the processing performance is improved by adopting the dedicated hardware for the arithmetic coding/decoding processing that requires high processing performance, performing motion detection, which has the largest processing amount, by the SIMD parallel data processor causes the following problem.

In the MPEG-4 AVC, motion compensation of ¼ pixel precision is introduced, and it is necessary to perform 6-tap filtering processing to generate half-pel pixels. Further, since the sub-macro-block size of 4×4 pels is introduced, up to 16 motion vectors per macro-block can be set. Motion detection processing that performs a search of ¼ pixel precision with the small sub-macro-block size and calculates up to 16 motion vectors per macro-block therefore requires a drastically increased processing amount.

For the SIMD data-parallel processor to perform such motion detection processing, it is necessary to enhance the degree of parallelism of the calculators and set the operating frequency to a high value. The capability of an SIMD parallel data processor dimensioned in this way is more than enough for the decoding processing; therefore, the entire processor cannot be efficiently used.

Furthermore, even if it is attempted to improve the processing performance by enhancing the degree of parallelism of the SIMD parallel data processor, since the block size is 4×4 pels, it is impossible for the degree of parallelism to be more than 16.

OBJECTS AND SUMMARY OF THE INVENTION

An object of the present invention is to provide a signal-processing apparatus capable of high-performance and high-efficiency image processing where a large data processing amount is required, as in the coding/decoding processing of the MPEG-4 AVC, and an electronic apparatus using the same.

A first aspect of the present invention provides a signal-processing apparatus comprising: an instruction-parallel processor; a data-parallel processor; and a plurality of pieces of dedicated hardware, wherein the instruction-parallel processor performs audio compression/decompression and non-routine or less-heavy operation of image compression/decompression, wherein the data-parallel processor performs, of the image compression/decompression, routine or heavy operation, and wherein the plurality of pieces of dedicated hardware perform, of the image compression/decompression, comparatively heavy processing.

According to the present structure, the signal-processing apparatus is composed of the instruction-parallel processor, the data-parallel processor and the dedicated hardware. The instruction-parallel processor performs non-routine processing of the audio compression/decompression and the image processing, the data-parallel processor performs the routine processing of the image processing, and the dedicated hardware performs processing such as motion detection, variable-length encoding, and de-blocking filtering processing. Consequently, for signal processing of an image compression and decompression algorithm with a large processing amount, the load is distributed between software and hardware, so that a signal-processing apparatus having high processing capability and flexibility can be realized.

A second aspect of the present invention provides a signal-processing apparatus according to the first aspect of the present invention, further comprising: a first instruction bus; a first data bus; a first shared memory; and an input and output interface, wherein each of the instruction-parallel processor, the data-parallel processor, the plurality of pieces of dedicated hardware and the input and output interface comprises a local memory, wherein the instruction-parallel processor, the data-parallel processor and the plurality of pieces of dedicated hardware are connected to the first instruction bus, wherein an instruction for the instruction-parallel processor to control the data-parallel processor and the plurality of pieces of dedicated hardware is communicated through the first instruction bus, and wherein the local memory of the instruction-parallel processor, the local memory of the data-parallel processor, the local memories of the plurality of pieces of dedicated hardware, the first shared memory and the local memory of the input and output interface are connected to the first data bus, and data transfer is performed among these memories.

According to the present structure, in addition to the characteristics of the signal-processing apparatus in the first aspect of the present invention, the bus traffic is distributed by separating the instruction bus and the data bus; therefore, the processing performance can be improved.

A third aspect of the present invention provides a signal-processing apparatus according to the second aspect of the present invention, further comprising: a second data bus; a second shared memory; and a bridge unit connecting the first data bus and the second data bus, wherein the local memory of the data-parallel processor, the local memories of the plurality of pieces of dedicated hardware, the first shared memory and the local memory of the input and output interface are connected to the first data bus, and data transfer is performed among these memories, wherein the local memory of the instruction-parallel processor and the second shared memory are connected to the second data bus, and data transfer is performed between these memories, and wherein data transfer between the memories connected to the first data bus and the memories connected to the second data bus is performed through the bridge unit.

According to the present structure, the local memory of the data-parallel processor, the local memories of the dedicated hardware and the first shared memory are connected by the first data bus, and the local memory of the instruction-parallel processor and the second shared memory are connected by the second data bus. With this, the data transfer in image processing handling a large amount of data is performed mainly through the first data bus, so that the load can be shared with the second data bus to which the instruction-parallel processor performing audio processing is connected.

A fourth aspect of the present invention provides a signal-processing apparatus according to the third aspect of the present invention, further comprising a control processor, wherein the instruction-parallel processor controls the data-parallel processor and the plurality of pieces of dedicated hardware through the control processor.

According to the present structure, since the instruction-parallel processor is capable of controlling the data-parallel processor and the dedicated hardware through the control processor, the load is distributed between the instruction-parallel processor and the control processor; therefore, higher processing performance can be realized.

A fifth aspect of the present invention provides a signal-processing apparatus according to the fourth aspect of the present invention, further comprising a second instruction bus, wherein the instruction-parallel processor, the control processor and a part of the plurality of pieces of dedicated hardware are connected to the first instruction bus, wherein the control processor, the data-parallel processor and the remainder of the plurality of pieces of dedicated hardware, the remainder being not connected to the first instruction bus, are connected to the second instruction bus, and wherein the instruction-parallel processor controls the part of the plurality of pieces of dedicated hardware, and controls, through the control processor, the data-parallel processor and the remainder of the plurality of pieces of dedicated hardware.

According to the present structure, the instruction-parallel processor needs to control only the control processor and part of the dedicated hardware through the first instruction bus, while the data-parallel processor that performs routine processing and the remaining dedicated hardware are controlled by the control processor through the second instruction bus. Instruction conflicts on the instruction buses can thus be avoided, and signal processing can be performed efficiently.

A sixth aspect of the present invention provides a signal-processing apparatus according to the first aspect of the present invention, wherein the data-parallel processor comprises a plurality of processing units, and wherein a number of the plurality of processing units of the data-parallel processor is determined according to a compressed or decompressed image size.

According to the present structure, since the degree of parallelism of the data-parallel processor is changed according to the size of the image which is the object of compression and decompression, a signal-processing apparatus capable of handling various image sizes with the same processor architecture can be provided.

A seventh aspect of the present invention provides a signal-processing apparatus according to the first aspect of the present invention, wherein the data-parallel processor comprises a plurality of processing units, and wherein a number of the plurality of processing units of the data-parallel processor is determined according to at least one of a power supply voltage and an operating frequency.

According to the present structure, the degree of parallelism of the data-parallel processor can be changed according to the power supply voltage and the operating frequency shared with LSIs. By increasing the degree of parallelism of the data-parallel processing, the operating frequency can be reduced and the power consumption of the signal processing can be decreased; therefore, application to electronic apparatuses such as mobile terminals is particularly effective.
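
The trade-off can be sketched with the usual first-order model of dynamic power in CMOS circuits, in which power is proportional to capacitance × voltage² × frequency. The model, the function name `dynamic_power` and all numerical values below are assumptions for illustration and do not appear in this text; in particular, it is assumed that a lower operating frequency permits a lower power supply voltage.

```python
def dynamic_power(units: int, capacitance: float, voltage: float, frequency: float) -> float:
    """First-order dynamic power of `units` identical processing units."""
    return units * capacitance * voltage ** 2 * frequency


# Same throughput in both cases: units * frequency is constant.
narrow_fast = dynamic_power(units=8, capacitance=1.0, voltage=1.2, frequency=200e6)
wide_slow = dynamic_power(units=16, capacitance=1.0, voltage=0.9, frequency=100e6)

saving = 1.0 - wide_slow / narrow_fast
```

With these illustrative numbers, doubling the number of processing units while halving the frequency and lowering the voltage reduces the modelled power by roughly 44%. At an unchanged voltage the model predicts no saving, so the benefit depends on the voltage reduction that the lower frequency allows.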

An eighth aspect of the present invention provides a signal-processing apparatus according to the first aspect of the present invention, wherein processing performed by the plurality of pieces of dedicated hardware includes at least one of variable-length coding processing, variable-length decoding processing, video input and output processing, motion detection processing, motion compensation processing, DCT (discrete cosine transform) processing, inverse DCT processing, quantization processing, inverse quantization processing and de-blocking filtering processing.

According to the present structure, in the compressing/decompressing processing, an increase in the operating frequency of the instruction-parallel processor and the data-parallel processor can be suppressed by having the dedicated hardware process the modules with a large processing amount, such as motion detection, variable-length encoding/decoding and de-blocking filtering.

A ninth aspect of the present invention provides a signal-processing apparatus according to the fifth aspect of the present invention, wherein processing performed by the part, of the plurality of pieces of dedicated hardware, the part being connected to the first instruction bus, is variable-length coding processing and/or variable-length decoding processing.

According to the present structure, the dedicated hardware performing variable-length encoding and/or decoding can be directly and frequently controlled by the instruction-parallel processor. Consequently, the variable-length encoding and/or decoding can be finely and diversely controlled.

A tenth aspect of the present invention provides an electronic apparatus comprising a signal-processing apparatus, the signal-processing apparatus comprising: an instruction-parallel processor; a data-parallel processor; and a plurality of pieces of dedicated hardware, wherein the instruction-parallel processor performs audio compression/decompression and non-routine or less-heavy operation of image compression/decompression, wherein the data-parallel processor performs, of the image compression/decompression, routine or heavy operation, wherein the plurality of pieces of dedicated hardware perform, of the image compression/decompression, comparatively heavy processing, and wherein the signal-processing apparatus performs at least one of audio compression processing, audio decompression processing, image compression processing and image decompression processing.

According to the present structure, an electronic apparatus that makes full use of the characteristics of the signal-processing apparatus can be provided.

An eleventh aspect of the present invention provides an electronic apparatus according to the tenth aspect of the present invention, further comprising: a reproducer; a demodulator/error corrector; a memory; and a plurality of D/A converters, wherein the reproducer reproduces modulated coded signals from a recording medium loaded therein, wherein the demodulator/error corrector demodulates the modulated coded signals reproduced by the reproducer, error-corrects the demodulated signals, and outputs the error-corrected signals as coded data, wherein the signal-processing apparatus decodes the coded data outputted by the demodulator/error corrector, and outputs the decoded data as video data and audio data, wherein the memory stores data before decoding, during decoding and after decoding, and wherein the plurality of D/A converters D/A-convert the video data and the audio data outputted by the signal-processing apparatus, and output an analog video output and an analog audio output.

According to the present structure, coded data can be efficiently decoded at high speed, and a reproduction electronic apparatus with a low power consumption can be realized.

A twelfth aspect of the present invention provides an electronic apparatus according to the tenth aspect of the present invention, further comprising: a plurality of A/D converters; a memory; an error corrector/modulator; and a recorder, wherein the plurality of A/D converters A/D-convert an inputted analog video input and analog audio input, and output video data and audio data, wherein the signal-processing apparatus encodes the video data and the audio data outputted by the plurality of A/D converters, and outputs coded data, wherein the memory stores data before encoding, during encoding and after encoding, wherein the error corrector/modulator adds an error correcting code to the coded data encoded by the signal-processing apparatus, modulates the coded data, and outputs the modulated data as coded signals, and wherein the recorder records the coded signals outputted by the error corrector/modulator onto a recording medium loaded therein.

According to the present structure, the AV signals can be efficiently encoded at high speed, and the recording electronic apparatus with a low power consumption can be realized.

A thirteenth aspect of the present invention provides an electronic apparatus comprising the electronic apparatus according to the eleventh aspect of the present invention and the electronic apparatus according to the twelfth aspect of the present invention.

According to the present structure, the AV signals can be efficiently encoded/decoded at high speed, and the electronic apparatus with a low power consumption, into which a recording function and a reproduction function are integrated, can be realized.

The above, and other objects, features and advantages of the present invention will become apparent from the following description read in conjunction with the accompanying drawings, in which like reference numerals designate the same elements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a signal-processing apparatus in a first embodiment of the present invention;

FIG. 2 is a block diagram of a signal-processing apparatus in a second embodiment of the present invention;

FIG. 3 is a block diagram of a video encoder in a third embodiment of the present invention;

FIG. 4 is a block diagram of a CABAC (Context Adaptive Binary Arithmetic Coding) arithmetic coding unit;

FIG. 5 shows the layout of a coding object block and adjacent blocks;

FIG. 6 illustrates how motion compensation of ¼ pixel precision is performed;

FIG. 7 is a block diagram of a de-blocking filter in the third embodiment of the present invention;

FIG. 8 illustrates the processing sequence of the de-blocking filter;

FIG. 9 shows a comparison of encoding processing amount between the third embodiment of the present invention and a different method;

FIG. 10 is a block diagram of a video decoder in a fourth embodiment of the present invention;

FIG. 11 is a block diagram of an audio encoder in a fifth embodiment of the present invention;

FIG. 12 is a block diagram of an audio decoder in the fifth embodiment of the present invention;

FIG. 13 is a block diagram of an AV reproduction system in a sixth embodiment of the present invention;

FIG. 14 is a block diagram of an AV recording system in a seventh embodiment of the present invention;

FIG. 15 is a block diagram of an AV recording/reproduction system in an eighth embodiment of the present invention;

FIG. 16 is a block diagram of the prior signal-processing apparatus comprising a combination of the SIMD parallel data processor and the MIMD parallel data processor; and

FIG. 17 is a block diagram of the prior image processor comprising the combination of the SIMD parallel data processor and the dedicated hardware.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, embodiments of the present invention are described with reference to the accompanying drawings.

(First Embodiment)

FIG. 1 is the block diagram of the signal-processing apparatus in the first embodiment of the present invention. The signal-processing apparatus of the present embodiment comprises: an instruction-parallel processor 100 having a local memory 110; a first data-parallel processor 101 having a local memory 111; a second data-parallel processor 102 having a local memory 112; a motion detection unit 103 having a local memory 113; a de-blocking filtering unit 104 having a local memory 114; a variable-length coding/decoding unit 105 having a local memory 115; an input and output interface 106 having a local memory 116; a first shared memory 121; a first instruction bus 130; and a first data bus 132. The processors 100 to 102 and the units 103 to 106 are connected to the first instruction bus 130, and the local memories 110 to 116, the first shared memory 121 and the input and output interface 106 are connected to the first data bus 132. The variable-length coding/decoding unit 105 further has a bit stream input and output 135 for external apparatuses, and the input and output interface 106 has an audio input and output 136 and a video input and output 137 for external apparatuses.

The SIMD processors adopted as the first data-parallel processor 101 and the second data-parallel processor 102 each include eight processing elements, and are capable of processing eight data streams in parallel with one instruction.

The motion detection unit 103, the de-blocking filtering unit 104, the variable-length coding/decoding unit 105 and the input and output interface 106 are each dedicated hardware.

Next, the operation of the present embodiment will be described in outline with image coding processing as an example.

After an externally inputted video signal is A/D converted, the signal is stored into the first shared memory 121 from the input and output interface 106 via the first data bus 132.

The motion detection unit 103 calculates the motion vector based on the image data of the previous frame stored in the first shared memory 121 and the image data of the present frame.
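
A minimal software sketch of such a motion vector calculation is shown below. It assumes a full search at integer-pel precision using the sum of absolute differences (SAD) as the matching cost; this text does not specify the search method of the motion detection unit 103, and the names `sad` and `full_search` are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                size: int, search: int) -> Tuple[int, int, int]:
    """Return (dx, dy, cost) of the best match within +/- search pels."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the previous frame.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    assert best is not None  # the zero displacement is always inside the frame
    return best


# Previous frame with distinct content, and a present frame whose 4x4 block at
# (2, 2) is copied from the previous frame displaced by (+1, -1).
ref = [[y * 16 + x * x for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        cur[2 + y][2 + x] = ref[2 - 1 + y][2 + 1 + x]

motion_vector = full_search(cur, ref, bx=2, by=2, size=4, search=2)
```

Every candidate displacement requires a full block comparison, so the cost grows with the search range and with the number of blocks, which is the reason this step is assigned to dedicated hardware.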

The first data-parallel processor 101 calculates the predicted image data by performing motion compensation processing based on the image data of the previous frame stored in the first shared memory 121 and the motion vector calculated by the motion detection unit 103. Moreover, difference image data of the image data of the present frame with respect to the predicted image data is calculated.
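
The two calculations can be sketched as follows, assuming integer-pel motion vectors only (the ¼ pixel interpolation is omitted); the function names `predict_block` and `difference_block` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def predict_block(previous: Frame, bx: int, by: int,
                  mv: Tuple[int, int], size: int) -> List[List[int]]:
    """Motion compensation: fetch the block of the previous frame displaced by mv."""
    dx, dy = mv
    return [[previous[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]


def difference_block(present: Frame, predicted: List[List[int]],
                     bx: int, by: int) -> List[List[int]]:
    """Difference image data: present block minus predicted block."""
    size = len(predicted)
    return [[present[by + y][bx + x] - predicted[y][x] for x in range(size)]
            for y in range(size)]


previous = [[y * 10 + x for x in range(6)] for y in range(6)]
present = [[y * 10 + x + 1 for x in range(6)] for y in range(6)]

predicted = predict_block(previous, bx=1, by=1, mv=(1, 0), size=2)
difference = difference_block(present, predicted, bx=1, by=1)
```

Both steps apply the same operation to every pel of the block, which makes them routine processing well suited to a data-parallel processor.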

The second data-parallel processor 102 discrete-cosine-transforms the difference image data, and quantizes the obtained DCT coefficients. Moreover, the second data-parallel processor 102 inversely quantizes the quantized DCT coefficients, inversely discrete-cosine-transforms them, calculates the difference image data, and calculates reconstructed image data from the difference image data and the predicted image data processed by the first data-parallel processor 101.
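
The forward and local decoding path described above can be sketched as below. The sketch assumes a floating-point orthonormal DCT on a 4×4 block and a single uniform quantization step `qstep`; the actual transform size, arithmetic and quantizer of the embodiment are not given in this passage, and all function names are hypothetical.

```python
import math
from typing import List

N = 4  # block width assumed for this sketch
Block = List[List[float]]


def _scale(k: int) -> float:
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_1d(v):
    return [_scale(k) * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N)) for k in range(N)]


def idct_1d(c):
    return [sum(_scale(k) * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N)) for n in range(N)]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    rows = [dct_1d(r) for r in block]                      # transform rows
    return _transpose([dct_1d(c) for c in _transpose(rows)])  # then columns


def idct_2d(coeffs):
    rows = [idct_1d(r) for r in coeffs]
    return _transpose([idct_1d(c) for c in _transpose(rows)])


def encode_and_reconstruct(current, predicted, qstep):
    """Return (quantized levels, reconstructed block) for one block."""
    residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, predicted)]
    levels = [[round(x / qstep) for x in row] for row in dct_2d(residual)]
    dequantized = [[lv * qstep for lv in row] for row in levels]
    rec_residual = idct_2d(dequantized)
    reconstructed = [[p + round(r) for p, r in zip(pr, rr)]
                     for pr, rr in zip(predicted, rec_residual)]
    return levels, reconstructed


predicted = [[100] * N for _ in range(N)]
current = [[108] * N for _ in range(N)]
levels, reconstructed = encode_and_reconstruct(current, predicted, qstep=4)
```

The quantized levels are what would be passed on for variable-length coding, while the reconstructed block is what a decoder would also obtain, which is why the encoder computes it from the quantized data rather than from the original difference.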

In the signal-processing apparatus of the present embodiment, while the first data-parallel processor 101 is performing the calculation of the pixel value of the motion compensation processing, the second data-parallel processor 102 performs the DCT processing. As described above, it is possible to cause two different data-parallel processors to perform different processing while maintaining the operating ratios thereof, whereby the performance is improved.
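
The gain from overlapping the two processors can be estimated with a simple two-stage pipeline model. The model assumes fixed per-block times for motion compensation and for DCT processing and ignores bus contention; the function names and the cycle counts are illustrative and do not come from this text.

```python
def sequential_time(blocks: int, t_mc: int, t_dct: int) -> int:
    """One processor performs both stages for every block in turn."""
    return blocks * (t_mc + t_dct)


def pipelined_time(blocks: int, t_mc: int, t_dct: int) -> int:
    """Two processors: one performs motion compensation while the other performs DCT."""
    if blocks == 0:
        return 0
    # The first block passes through both stages; afterwards one block
    # completes every max(t_mc, t_dct) time units.
    return t_mc + t_dct + (blocks - 1) * max(t_mc, t_dct)


one_processor = sequential_time(blocks=100, t_mc=3, t_dct=2)
two_processors = pipelined_time(blocks=100, t_mc=3, t_dct=2)
```

The closer the two stage times are to each other, the nearer the speedup approaches a factor of two, which corresponds to keeping the operating ratios of both processors high.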

The de-blocking filtering unit 104 performs de-blocking filtering processing on the reconstructed image data, removes block noise, and stores the result into the first shared memory 121.

The variable-length coding/decoding unit 105 performs variable-length coding processing using an arithmetic code on the quantized DCT coefficient and the motion vector, and outputs the coded data as a bit stream.

The instruction-parallel processor 100 performs the overall control of the above-described various processing through the first instruction bus 130. Moreover, the instruction-parallel processor 100 performs a coding mode determination as to whether to perform the generation of the predicted image by intra prediction coding or by inter prediction coding.

The data transfer between the processors and the units is performed through the first data bus 132.

High-efficiency image processing can be realized by performing sequential processing of the image compression/decompression by the instruction-parallel processor 100, performing routine processing of the image compression/decompression by the first data-parallel processor 101 and the second data-parallel processor 102, and performing heavy processing such as the motion detection processing, the de-blocking filtering processing and the variable-length coding processing by the dedicated hardware as described above.

The division of processing between the first data-parallel processor 101 and the second data-parallel processor 102 in the present embodiment is only an example, and it may be different. For example, depending on the performance of the processor, the processing of the first data-parallel processor 101 and the second data-parallel processor 102 may be performed by one data-parallel processor.

Further, the motion compensation processing performed by the first data-parallel processor 101 may be performed by the motion detection unit 103.

(Second Embodiment)

FIG. 2 is the block diagram of the signal-processing apparatus in the second embodiment of the present invention. In FIG. 2, components similar to those of FIG. 1 are denoted by the same reference numerals, and descriptions thereof are omitted.

The signal-processing apparatus of the present embodiment further comprises, compared to the signal-processing apparatus of the first embodiment, a control processor 107, a second shared memory 122, a second instruction bus 131, a second data bus 133, and a bridge unit 120 connecting the first data bus 132 and the second data bus 133.

The instruction-parallel processor 100, the control processor 107 and the variable-length coding/decoding unit 105 are connected to the first instruction bus 130.

The control processor 107, the first data-parallel processor 101, thesecond data-parallel processor 102, the motion detection unit 103 andthe de-blocking filtering unit 104 are connected to the secondinstruction bus 131.

The local memories 111 to 115, the first shared memory 121, the input and output interface 106 and the bridge unit 120 are connected to the first data bus 132. The local memory 110, the second shared memory 122 and the bridge unit 120 are connected to the second data bus 133.

In the signal-processing apparatus of the present embodiment, the parallel processing of data is enhanced compared to that of the first embodiment. That is, the control processor 107 introduced in the present embodiment controls, in response to an instruction from the instruction-parallel processor 100, the first data-parallel processor 101, the second data-parallel processor 102, the motion detection unit 103 and the de-blocking filtering unit 104 through the second instruction bus 131. Consequently, the signal-processing apparatus of the present embodiment is capable of more rapidly performing parallel processing by the data-parallel processors and the dedicated hardware.

Further, the second shared memory 122 of the present embodiment stores data related to the instruction-parallel processor 100, and data accessed at a comparatively low frequency among the data handled by the components connected to the first data bus 132. The structure reduces the load on the first shared memory 121, so that the processing efficiency of the entire signal-processing apparatus is improved.

The operation of the present embodiment will be described in detail in the third embodiment described below.

(Third Embodiment)

FIG. 3 is the block diagram showing a video encoder in the third embodiment of the present invention.

The video encoder of the present embodiment is an encoder conforming to the MPEG-4 AVC. Each component is given a name that expresses its function in a video encoder conforming to the MPEG-4 AVC.

The video encoder of the present embodiment shown in FIG. 3 comprises the signal-processing apparatus of the second embodiment. Therefore, the correspondence between the components of FIG. 3 and the components of FIG. 2 will be shown first.

The processing of a coding controller 301 and a mode switcher 303 is performed by the instruction-parallel processor 100 of FIG. 2.

The processing of a motion compensator 312 and a difference detector 302 is performed by the first data-parallel processor 101 of FIG. 2.

The processing of a 4×4 DCT transformer 304, a quantizer 305, an inverse quantizer 306, an inverse 4×4 DCT transformer 307 and a reconstructor 309 is performed by the second data-parallel processor 102 of FIG. 2.

A variable-length coder 308 corresponds to the variable-length coding/decoding unit 105 of FIG. 2. A de-blocking filter 310 corresponds to the de-blocking filtering unit 104 of FIG. 2. A frame memory 311 corresponds to the first shared memory 121 of FIG. 2, and a motion detector 313 corresponds to the motion detection unit 103 of FIG. 2.

Next, primary signal processing of the MPEG-4 AVC will be described with reference to the operation of the components of the present embodiment.

First, encoding processing will be described with reference to FIG. 3. A video input 314 is, in the case of intra coding, discrete-cosine-transformed (orthogonally transformed) by the 4×4 DCT transformer 304 to obtain the DCT coefficient. Then, the DCT coefficient is quantized by the quantizer 305.

According to existing coding standards such as the MPEG-2 and the H.263, a real-precision DCT is adopted for the 8×8 block size, and a mismatch occurs unless the DCT precision is defined. However, according to the MPEG-4 AVC, an integral-precision DCT is applied for the 4×4 block size, and consequently, a mismatch due to the DCT precision does not occur.
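As an illustration of why an integral-precision transform avoids mismatch, the following minimal sketch applies a 4×4 integer core transform using only integer additions and multiplications. The matrix C is the forward core transform matrix of the H.264 standard and is an assumption here, since the description above does not list it; the function names are hypothetical. In the standard, the scaling that makes this an approximation of the DCT is folded into the quantization step, which this sketch omits.

```python
# Forward 4x4 integer core transform, Y = C * X * C^T.
# Assumption: C is the core transform matrix from the H.264 standard;
# it does not appear in the description above.
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_transform_4x4(block):
    """Transform a 4x4 block of integer samples; every result is an exact integer."""
    return matmul(matmul(C, block), transpose(C))
```

Because every intermediate value is an integer, an encoder and a decoder built from the same matrices obtain bit-identical results, which is not guaranteed for a real-precision 8×8 DCT.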

The quantized DCT coefficient is entropy-coded by use of an arithmetic coder at the variable-length coder 308. Details thereof will be described later.

Next, variable-length coding/decoding processing will be described.

The outline of the MPEG-4 AVC is described in Document 3 “The overview of MPEG-4 AVC|H.264 and its standardization” (Teruaki SUZUKI, Information Processing Society of Japan, Audio Visual and Multimedia Information Processing 38-13, pp. 69-73, November, 2002). Description will be given based on Document 3.

In the variable-length coding of syntax elements such as the number of macro-blocks, the motion vector difference and the conversion factor, the following two entropy coding methods are selectively used: CAVLC (Context Adaptive Variable Length Coding); and CABAC (Context Adaptive Binary Arithmetic Coding).

In this description, the arithmetic coding method called CABAC, which is used in the main profile, will be explained. In arithmetic coding, a line segment with a length of “1” is divided according to the probability of occurrence of the symbol to be coded, and since the divided line segment and the symbol to be coded correspond one to one to each other, coding is performed with respect to the line segment. Since the binary number representative of the line segment is the code, the larger the line segment, that is, the higher the probability of occurrence of the symbol to be coded, the shorter the binary number by which the symbol can be expressed, and consequently, the compression rate is increased. Therefore, when the object block is coded, the probability of occurrence is manipulated in accordance with the context of the peripheral blocks so that the compression rate is increased.
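The relation between the size of the line segment and the code length can be illustrated with a minimal sketch. It is not the CABAC procedure itself: it uses a fixed, hypothetical two-symbol probability table (PROBS), exact fractions instead of the finite-precision arithmetic of a real coder, and a simple bound on the number of bits needed to name a point inside the final segment.

```python
import math
from fractions import Fraction

# Hypothetical fixed probability table for two symbols (an assumption for
# illustration; CABAC adapts its probabilities while coding).
PROBS = [("a", Fraction(3, 4)), ("b", Fraction(1, 4))]


def interval_for(symbols, probs):
    """Subdivide the unit segment once per symbol; return (low, width)."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        cum = Fraction(0)  # total probability of the symbols listed before s
        for name, p in probs:
            if name == s:
                low += width * cum
                width *= p
                break
            cum += p
        else:
            raise ValueError("unknown symbol: %r" % (s,))
    return low, width


def code_length_bits(width):
    """Bits sufficient to identify a binary fraction lying inside the segment."""
    return math.ceil(-math.log2(width)) + 1
```

With this table, four occurrences of the probable symbol "a" leave a segment of width 81/256, which needs about 3 bits, whereas four occurrences of the improbable symbol "b" leave a segment of width 1/256, which needs about 9 bits.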

FIG. 4 is a block diagram of a CABAC arithmetic coding unit. This figure is taken from FIG. 7 of Document 3. The CABAC arithmetic coding unit shown in FIG. 4 has a context modeler 401, a binarizer 402 and an adaptive binary arithmetic coding processor 405. The adaptive binary arithmetic coding processor 405 has an occurrence probability predictor 403 and a coder 404.

The context modeling provides the probability model used when each symbol is coded. A context is defined for each syntax element, and arithmetic coding is performed by switching the probability table in accordance with the context.

FIG. 5 shows the layout of the coding object block and adjacent blocks. In FIG. 5, when a coding object block C408 is coded, the context of the coding object block C408 is determined in accordance with the condition of adjacent blocks A406 and B407.
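The switching of probability tables by context can be sketched as follows. This is a simplified illustration under stated assumptions, not the CABAC state machine: the context index is taken as the sum of one condition flag from block A and one from block B (the form used for several syntax elements in the H.264 standard), and each context keeps a hypothetical frequency-count estimate of the probability that the next binary symbol is 1.

```python
class ContextModel:
    """Hypothetical adaptive estimate of P(bin == 1) for one context."""

    def __init__(self):
        self.ones = 1   # start from a uniform estimate of 1/2
        self.total = 2

    def p_one(self):
        return self.ones / self.total

    def update(self, bin_value):
        self.ones += bin_value
        self.total += 1


def context_index(flag_a, flag_b):
    """Select context 0, 1 or 2 from the conditions of adjacent blocks A and B."""
    return int(bool(flag_a)) + int(bool(flag_b))


# One probability model per context (three contexts in this sketch).
contexts = [ContextModel() for _ in range(3)]


def code_bin(bin_value, flag_a, flag_b):
    """Return the probability estimate used for this bin, then adapt the model."""
    ctx = contexts[context_index(flag_a, flag_b)]
    p = ctx.p_one()
    ctx.update(bin_value)
    return p
```

Because each update depends on the previous one and on a table lookup, the procedure is inherently sequential, which is the property the following paragraph relies on.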

In the above-described arithmetic coding processing, the decoding of the variable-length-coded code is a sequential processing in which a decoder analyzes occurrence probability information and performs reconstruction based on that information. Moreover, since the manipulation of the probability of occurrence is performed by use of a table, performing the coding processing and the decoding processing by use of a VLIW (Very Long Instruction Word)-compliant instruction-parallel processor (in the above-described second embodiment, corresponding to the instruction-parallel processor 100 shown in FIG. 2) or an SIMD data-parallel processor (similarly, corresponding to the first data-parallel processor 101 or the second data-parallel processor 102) does not improve the processing performance. Rather, by performing this processing by use of dedicated hardware (similarly, corresponding to the variable-length coding/decoding unit 105), the load can be distributed among the instruction-parallel processor, the data-parallel processors and the dedicated hardware. Consequently, the required operating frequency is reduced, and the operating frequencies of the processors can be well balanced. This is why the processing of the variable-length coder 308 shown in FIG. 3 is performed by the variable-length coding/decoding unit 105 shown in FIG. 2, which is dedicated hardware in the present embodiment.

In FIG. 3, the DCT coefficient quantized by the quantizer 305 is inversely quantized by the inverse quantizer 306 and is then inversely discrete-cosine-transformed by the inverse 4×4 DCT transformer 307, and the image is reconstructed by the reconstructor 309. On the reconstructed image, de-blocking filtering processing is performed by the de-blocking filter 310, and the pixel value is rewritten at the 4×4 pixel boundary. The de-blocking filtering processing will be described later.

Next, the motion compensation processing of ¼ pixel precision performed by the motion compensator 312 of FIG. 3 will be described with reference to FIG. 6. FIG. 6 is an explanatory view for explaining the motion compensation of ¼ pixel precision.

Motion compensation is to construct a predicted image closer to the image to be coded, by use of information on the motion vector when a predicted image is constructed from an image referred to. Since the code amount decreases as the prediction error decreases, the MPEG-4 AVC adopts the motion compensation of ¼ pixel precision. The motion vector comprises two parameters representative of a translational movement in the unit of blocks (the distance moved in the horizontal direction and the distance moved in the vertical direction).

The predicted image at the position in the reference image pointed to by the motion vector is obtained in the following manner:

In FIG. 6, pixels A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, Q, R, S, T and U are pixels in integral positions, pixels aa, bb, cc, dd, ee, ff, gg and hh and pixels b, h, j, m and s are pixels of ½ precision, and pixels a, c, d, e, f, g, i, k, n, p, q and r are pixels of ¼ precision.

The procedure of obtaining the values of these pixels is now described. First, the pixel b of ½ precision is obtained in the following manner: With the pixels E, F, G, H, I and J in the vicinity of the pixel b in the horizontal direction as variables, intermediate data b1 is generated by use of a 6-tap filter defined by (Equation 1).

b1=(E−5*F+20*G+20*H−5*I+J)  [Equation 1]

Then, the intermediate data b1 is rounded and normalized by (Equation 2) and clipped to 0 to 255, whereby the pixel b is obtained.

b=Clip((b1+16)/32)  [Equation 2]

Here, Clip(X) is a function that clips the variable X inside the parentheses to a range of 0 to 255. That is, when the variable X is less than 0, b=0, when the variable X is in the range of 0 to 255, b=X, and when the variable X is not less than 256, b=255.

Likewise, the pixel h of ½ precision is obtained in the following manner: With the pixels A, C, G, M, R and T in the vicinity of the pixel h in the vertical direction as variables, intermediate data h1 is generated by use of a 6-tap filter defined by (Equation 3).

h1=(A−5*C+20*G+20*M−5*R+T)  [Equation 3]

The intermediate data h1 is rounded and normalized by (Equation 4) and clipped to 0 to 255, whereby the pixel h is obtained.

h=Clip((h1+16)/32)  [Equation 4]
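The ½ precision interpolation of (Equation 1) to (Equation 4) can be sketched as below. The function names are hypothetical. The division by 32 is written as a right shift by 5, which is an assumption about the integer rounding; for negative intermediate values the choice does not matter, because the result is clipped to 0 in either case.

```python
def clip(x):
    """Clip x to the 8-bit pixel range 0 to 255."""
    return max(0, min(255, x))


def six_tap(p0, p1, p2, p3, p4, p5):
    """Intermediate data of the 6-tap filter, coefficients (1, -5, 20, 20, -5, 1)."""
    return p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5


def half_pel(p0, p1, p2, p3, p4, p5):
    """Half-precision pixel: round, normalize by 32, then clip."""
    return clip((six_tap(p0, p1, p2, p3, p4, p5) + 16) >> 5)
```

The same function serves both directions: half_pel(E, F, G, H, I, J) gives the pixel b, and half_pel(A, C, G, M, R, T) gives the pixel h.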

The pixels a, c, d, f, i, k, n and q of ¼ precision are each obtained by a rounded average by use of two neighboring pixels as shown in (Equation 5).

a=(G+b+1)/2
c=(H+b+1)/2
d=(G+h+1)/2
f=(b+j+1)/2
i=(h+j+1)/2
k=(j+m+1)/2
n=(M+h+1)/2
q=(j+s+1)/2  [Equation 5]

Likewise, the pixels e, g, p and r of ¼ precision are each obtained by a rounded average by use of two neighboring pixels as shown in (Equation 6).

e=(b+h+1)/2
g=(b+m+1)/2
p=(h+s+1)/2
r=(m+s+1)/2  [Equation 6]
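A minimal sketch of the ¼ precision step follows, in which each quarter-precision pixel is the rounded average of two neighboring integer-position or half-precision pixels. The function names are hypothetical, and the argument names follow the pixel labels of FIG. 6; the pairing of neighbors follows the quarter-sample interpolation of the H.264 standard.

```python
def rounded_average(x, y):
    """Average of two pixel values, rounded half up."""
    return (x + y + 1) >> 1


def quarter_pels(G, H, M, b, h, j, m, s):
    """Return the twelve quarter-precision pixels around integer pixel G."""
    avg = rounded_average
    return {
        # averages of an integer-position pixel and a half-precision pixel
        "a": avg(G, b), "c": avg(H, b), "d": avg(G, h), "n": avg(M, h),
        # averages of a half-precision pixel and the centre pixel j
        "f": avg(b, j), "i": avg(h, j), "k": avg(j, m), "q": avg(j, s),
        # diagonal positions: averages of two half-precision pixels
        "e": avg(b, h), "g": avg(b, m), "p": avg(h, s), "r": avg(m, s),
    }
```

None of the twelve results depends on another, so all of them can be computed at the same time once the half-precision pixels are available.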

In the predicted image generation as described above, the motion vector can be set for each sub-macro-block. In the case of 4×4 where the sub-macro-blocks are smallest, it is necessary to interpolate pixels in 16 real positions from the pixels in the integral positions by use of a 6-tap filter. In the pixel interpolation, since there is no data dependence among pixels, processing can be performed in parallel. Therefore, by using the SIMD data-parallel processor as shown in the present embodiment, filtering processing can be efficiently performed.

Next, the de-blocking filtering will be described.

According to the MPEG-4 AVC, since the DCT processing is performed in the unit of 4×4 pixels, block distortion occurs at the pixel boundary. The de-blocking filtering processing smoothes the distortion by performing filtering on the block boundary. The filtering processing performed on the 4×4 boundaries of the image is an adaptive filtering processing in which the filter strength is adjusted to a value most suitable for each block boundary in accordance with the value of the Boundary Strength (BS). That is, the boundary strength BS is used for determining whether to perform filtering on the boundary or not and defining the maximum value of pixel value variations when filtering is performed.

FIG. 7 is a block diagram of the de-blocking filter 310 according to the third embodiment of the present invention. The de-blocking filter 310 of the present embodiment comprises a BS condition determination processor 602, a memory 603, a controller 604 and a filtering processor 605. The filtering processor 605 comprises a memory 606 and filters 607 to 609.

In the de-blocking filter 310 shown in FIG. 7, the BS condition determination processor 602 calculates the boundary strength BS, determines the result, and passes a control parameter 613 to the filtering processor 605. The filtering processor 605 performs filtering processing in accordance with the control parameter 613.

The processing of the de-blocking filter 310 is now described with reference to FIG. 8.

FIG. 8 shows the processing sequence of the de-blocking filter 310 according to the third embodiment of the present invention. As the filtering processing, as shown in FIG. 8, horizontal filtering processing for boundaries [1] to [4] is performed, and then, vertical filtering processing for boundaries [5] to [8] is performed.

Filtering processing when the boundary strength BS=4 will be described. In the first filtering processing on the boundary [1] of a 4×4 sub-macro-block, with eight pixels p3, p2, p1, p0, q0, q1, q2 and q3 sandwiching the boundary [1] as the inputs, six pixels p2, p1, p0, q0, q1 and q2 are rewritten to pixels P2, P1, P0, Q0, Q1 and Q2.

The filtering equation for the pixels P2, P1 and P0 is switched according to the condition of (Equation 7), and the pixels are calculated by (Equation 8) or (Equation 9).

ap<β and |p0−q0|<4α+2
ap=|p2−p0|  [Equation 7]
α: coefficient 1 calculated from quantization parameter
β: coefficient 2 calculated from quantization parameter

When the condition of (Equation 7) is satisfied, the pixels P0, P1 and P2 are obtained by (Equation 8).

P0=(p2+2*p1+2*p0+2*q0+q1+4)/8
P1=(p2+p1+p0+q0+2)/4
P2=(2*p3+3*p2+p1+p0+q0+4)/8  [Equation 8]

When the condition of (Equation 7) is not satisfied, the pixels P0, P1 and P2 are obtained by (Equation 9).

P0=(2*p1+p0+q1+2)/4
P1=p1
P2=p2  [Equation 9]

The filtering equation for the pixels Q0, Q1 and Q2 is switched according to the condition of (Equation 10), and the pixels are calculated by (Equation 11) or (Equation 12).

aq<β and |p0−q0|<4α+2
aq=|q2−q0|  [Equation 10]
α: coefficient 1 calculated from quantization parameter
β: coefficient 2 calculated from quantization parameter

When the condition of (Equation 10) is satisfied, the pixels Q0, Q1 and Q2 are calculated by (Equation 11).

Q0=(p1+2*p0+2*q0+2*q1+q2+4)/8
Q1=(p0+q0+q1+q2+2)/4
Q2=(2*q3+3*q2+q1+q0+p0+4)/8  [Equation 11]

When the condition of (Equation 10) is not satisfied, the pixels Q0, Q1 and Q2 are calculated by (Equation 12).

Q0=(2*q1+q0+p1+2)/4
Q1=q1
Q2=q2  [Equation 12]
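The BS=4 filtering of one boundary can be sketched as follows, under stated assumptions. The outcomes of the switching conditions are passed in as the flags strong_p and strong_q rather than computed from α and β, so that the sketch mirrors the separation between the condition determination and the filtering; the function names are hypothetical. Because (Equation 11) and (Equation 12) are the mirror images of (Equation 8) and (Equation 9), the q side reuses the p-side function with the pixel roles exchanged.

```python
def filter_p_side(p3, p2, p1, p0, q0, q1, strong):
    """Return (P2, P1, P0) by Equation 8 when strong is true, else by Equation 9."""
    if strong:
        P0 = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
        P1 = (p2 + p1 + p0 + q0 + 2) >> 2
        P2 = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
    else:
        P0 = (2 * p1 + p0 + q1 + 2) >> 2
        P1, P2 = p1, p2
    return P2, P1, P0


def filter_edge_bs4(p, q, strong_p, strong_q):
    """Filter one line of pixels across a boundary.

    p is (p3, p2, p1, p0) and q is (q0, q1, q2, q3); the result is
    ((P2, P1, P0), (Q0, Q1, Q2)).
    """
    p3, p2, p1, p0 = p
    q0, q1, q2, q3 = q
    P2, P1, P0 = filter_p_side(p3, p2, p1, p0, q0, q1, strong_p)
    # Mirror image of the p side: Equations 11 and 12.
    Q2, Q1, Q0 = filter_p_side(q3, q2, q1, q0, p0, p1, strong_q)
    return (P2, P1, P0), (Q0, Q1, Q2)
```

For a step from 10 to 50 across the boundary with both conditions satisfied, the six rewritten pixels form the ramp 15, 20, 25, 35, 40, 45.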

When the filtering processing is adaptively switched according to the quantization parameter and the pixel values as described above, the BS condition determination cannot be performed in parallel by the SIMD data-parallel processor, so that the calculators disposed in parallel cannot be effectively used. Instead, by performing the de-blocking filtering processing by dedicated hardware comprising the BS condition determination processor 602 and the filtering processor 605 as shown in FIG. 7, the BS calculation processing and the filtering processing can be performed separately; this speeds up the BS condition determination processing and allows the filtering processing to be performed in parallel. Consequently, the de-blocking filtering processing can be efficiently performed. Further, since there is no data dependence between the brightness Y and the color differences UV, the filtering processor is capable of processing them in parallel, and the addition of calculators can further reduce the number of processing cycles. This is why the processing of the de-blocking filter 310 shown in FIG. 3 is performed by the de-blocking filtering unit 104 shown in FIG. 2, which is dedicated hardware in the present embodiment.

The image having undergone the de-blocking filtering processing by the de-blocking filter 310 in the video encoder of the present embodiment shown in FIG. 3 is stored in the frame memory 311 because it is not only used as the output image but also is referred to as the reference image for the frame and succeeding frames.

Next, the processing amount required when the video encoder shown in FIG. 3 is implemented by the signal-processing apparatus of the present embodiment is compared with the processing amount required when it is implemented by a different method.

FIG. 9 is a bar chart showing the comparison of the encoding processing amount between the third embodiment of the present invention and a different method.

In FIG. 9, a method 1 is a case where the video encoder shown in FIG. 3 is structured by use of a processor capable of issuing one instruction per clock cycle and all the processing is performed via software. A method 2 is a case where the video encoder shown in FIG. 3 is structured by a combination of an MIMD parallel data processor and an SIMD parallel data processor and all the processing is performed via software. A method 3 is a case where the video encoder shown in FIG. 3 is structured by use of an SIMD parallel data processor and dedicated VLC hardware. A method 4 is a case where the video encoder shown in FIG. 3 is structured by a VLIW parallel data processor, an SIMD parallel data processor and dedicated hardware, and corresponds to the present embodiment. That is, the VLIW parallel data processor of the method 4 corresponds to the instruction-parallel processor 100 of the present embodiment shown in FIG. 2, the SIMD parallel data processor of the method 4 corresponds to the first data-parallel processor 101 and the second data-parallel processor 102 of the present embodiment shown in FIG. 2, and the dedicated hardware of the method 4 corresponds to the motion detection unit 103, the de-blocking filtering unit 104 and the variable-length coding/decoding unit 105 of the present embodiment shown in FIG. 2.

In the encoding processing, the processing amount in the motion detection, the motion compensation, the variable-length coding and the de-blocking filtering is large.

The concrete numerical values of these processing amounts compare among the methods as follows: In the method 1, the motion detection processing is “3048” megacycles, the variable-length coding processing is “1000” megacycles, the de-blocking filtering processing is “321” megacycles, the motion compensation processing is “314” megacycles, and the remaining processing is “217” megacycles. The total processing amount is “4900” megacycles.

In the method 2, the motion detection processing is “381” megacycles, the variable-length coding processing is “333” megacycles, the de-blocking filtering processing is “107” megacycles, the motion compensation processing is “39” megacycles, and the remaining processing is “52” megacycles. The total processing amount is “900” megacycles.

In the method 3, the motion detection processing is “381” megacycles, the variable-length coding processing is “67” megacycles, the de-blocking filtering processing is “80” megacycles, the motion compensation processing is “39” megacycles, and the remaining processing is “30” megacycles. The total processing amount is “607” megacycles.

In the method 4, the motion detection processing is “203” megacycles, the variable-length coding processing is “67” megacycles, the de-blocking filtering processing is “21” megacycles, the motion compensation processing is “21” megacycles, and the remaining processing is “29” megacycles. The total processing amount is “352” megacycles.

The motion detection processing is a processing of selecting a position (motion vector) where the sum of the absolute values of the differences between the pixel values of the object macro-block and the reference macro-block is the smallest. In the case of the MPEG-4 AVC, the motion vector can be set in the unit of 4×4 sub-macro-blocks.

Therefore, the calculation of the sum of the absolute values of the differences among 16 pixels can be processed in parallel. In the methods 2 and 3, the motion detection processing is performed by an 8-parallel SIMD parallel data processor, and compared to the method 1, a significant speedup is realized. In the method 4, since the motion detection processing is performed by 16-parallel dedicated hardware capable of calculating the sum of the absolute values of the differences, higher-speed processing than the SIMD parallel data processor is realized.
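The motion detection described above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) over a 4×4 block. This is an illustration only: the function names, the integer-pixel full search and the search_range parameter are assumptions, and the description does not state which search strategy the motion detection unit 103 uses.

```python
def sad_4x4(cur, ref, rx, ry):
    """SAD between a 4x4 block cur and the 4x4 block of ref whose top-left is (rx, ry)."""
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(4) for x in range(4))


def full_search(cur, ref, cx, cy, search_range):
    """Return (sad, dx, dy) of the best match around (cx, cy), integer pixels only."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that would read outside the reference image.
            if 0 <= rx <= width - 4 and 0 <= ry <= height - 4:
                cost = sad_4x4(cur, ref, rx, ry)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best
```

The 16 absolute differences inside sad_4x4 are independent of one another, which is what allows an 8-parallel SIMD processor or 16-parallel dedicated hardware to evaluate them simultaneously.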

The motion compensation processing is a processing of obtaining, with ¼ pixel precision, the reference image pointed to by the motion vector. In this processing, parallel processing is also possible because processing is performed in the unit of 4×4 sub-macro-blocks. As in the case of the motion detection processing, the motion compensation processing is performed by the 8-parallel SIMD parallel data processor in the methods 2 and 3 and by the dedicated hardware in the method 4, whereby a significant speedup is realized.

The variable-length coding processing, which is an arithmetic coding processing called CABAC, is a sequential processing of performing coding by changing the probability of occurrence of the object block in accordance with the context of the peripheral block. The method 2 is intended to perform the variable-length coding processing by using an MIMD parallel data processor capable of issuing four instructions, and the processing amount is reduced, at best, to ⅓ that of the one-instruction-issuing processor of the method 1. In the methods 3 and 4, the VLC processing is performed by dedicated hardware, and since the determination processing and the table search processing are performed at high speed, the processing time can be reduced to 1/15 that of the method 1.

The de-blocking filtering processing is performed as parallel processing by the MIMD parallel data processor in the method 2, and by the SIMD parallel data processor in the method 3. Since the performance of the filtering processing is not improved in the MIMD type and the performance of the BS determination processing is not improved in the SIMD type, the processing time can be reduced only to ⅓ to ¼. On the other hand, in the method 4, the de-blocking filtering processing is performed by the dedicated hardware, and by separating the BS determination processing from the filtering processing and performing a pipeline operation, the processing time can be reduced to 1/15 that of the method 1.
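The reduction ratios quoted above can be checked against the per-stage megacycle figures given for FIG. 9. The sketch below only restates those figures; the dictionary layout and the function name are hypothetical.

```python
# Per-stage processing amounts in megacycles, as stated for FIG. 9.
CYCLES = {
    "method 1": {"motion detection": 3048, "vlc": 1000, "de-blocking": 321,
                 "motion compensation": 314},
    "method 2": {"motion detection": 381, "vlc": 333, "de-blocking": 107,
                 "motion compensation": 39},
    "method 3": {"motion detection": 381, "vlc": 67, "de-blocking": 80,
                 "motion compensation": 39},
    "method 4": {"motion detection": 203, "vlc": 67, "de-blocking": 21,
                 "motion compensation": 21},
}


def speedup(stage, method):
    """Ratio of the method 1 processing amount to that of the given method."""
    return CYCLES["method 1"][stage] / CYCLES[method][stage]


if __name__ == "__main__":
    for method in ("method 2", "method 3", "method 4"):
        for stage in CYCLES[method]:
            print(method, stage, round(speedup(stage, method), 1))
```

The ratios come out at about 3 for the variable-length coding in the method 2, about 15 for the variable-length coding in the methods 3 and 4, about 3 and 4 for the de-blocking filtering in the methods 2 and 3, and about 15 for the de-blocking filtering in the method 4. The motion detection ratio of exactly 8 in the methods 2 and 3 matches the 8-parallel SIMD processor.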

As is apparent from the above, by constituting the motion detection processing, the motion compensation processing, the variable-length coding processing and the de-blocking filtering by dedicated hardware like in the present embodiment, a significant speedup is realized.

(Fourth Embodiment)

FIG. 10 is a block diagram of a video decoder according to a fourth embodiment of the present invention.

The video decoder of the present embodiment is a decoder conforming to the MPEG-4 AVC. Each component is given a name that expresses its function in a video decoder conforming to the MPEG-4 AVC.

The video decoder of the present embodiment shown in FIG. 10 comprises the signal-processing apparatus of the second embodiment. The correspondence between the components of FIG. 10 and the components of FIG. 2 is now shown.

The processing of a decoding controller 331 is performed by the instruction-parallel processor 100 of FIG. 2.

The processing of a motion vector decoder 336 and a motion compensator 337 is performed by the first data-parallel processor 101 of FIG. 2.

The processing of an inverse quantizer 333, an inverse 4×4 DCT transformer 334 and a reconstructor 335 is performed by the second data-parallel processor 102 of FIG. 2.

A variable-length decoder 332 corresponds to the variable-length coding/decoding unit 105 of FIG. 2, a de-blocking filter 338 corresponds to the de-blocking filtering unit 104 of FIG. 2, and a frame memory 339 corresponds to the first shared memory 121 of FIG. 2.

The outline of the operation of the video decoder of the present embodiment is now described.

An encoded video input 341 encoded by arithmetic encoding is inputted to the variable-length decoder 332 and decoded to obtain the quantized DCT coefficient and the motion vector difference. The obtained quantized DCT coefficient is inversely quantized by the inverse quantizer 333, and then, inversely discrete-cosine-transformed by the inverse 4×4 DCT transformer 334 to obtain the difference image data.

On the other hand, the motion vector is obtained by the motion vector decoder 336 from the motion vector difference obtained by the variable-length decoder 332, and the predicted image is obtained by the motion compensator 337 from the reference image and the motion vector stored in the frame memory 339.

A new image is reconstructed by the reconstructor 335 from the difference image data and the predicted image and outputted as a video output 342. The outputted video output 342 is, at the same time, de-blocking-filtering-processed by the de-blocking filter 338, and then, stored into the frame memory 339.
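The two decoder steps described above, recovering the motion vector from its difference and reconstructing the image from the predicted image and the difference image data, can be sketched as follows. The function names are hypothetical. That the motion vector difference is taken relative to a predicted motion vector, and that the reconstructed samples are clipped to the 8-bit range, are assumptions based on common practice in the MPEG-4 AVC and are not stated in the description.

```python
def decode_motion_vector(predicted_mv, mv_difference):
    """Add the decoded difference to a predicted motion vector (x, y)."""
    return (predicted_mv[0] + mv_difference[0],
            predicted_mv[1] + mv_difference[1])


def reconstruct(predicted, residual):
    """Add difference image data to the predicted image, clipping to 0..255."""
    return [[max(0, min(255, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]
```

For example, reconstruct([[100, 250]], [[-5, 10]]) yields [[95, 255]], the second sample being clipped.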

The control of the inverse quantizer 333 and the inverse 4×4 DCT transformer 334 is performed by the decoding controller 331.

The de-blocking filtering processing, the inverse quantization processing and the inverse DCT processing are similar to those in the third embodiment, and descriptions thereof are omitted.

In the present embodiment, by performing the variable-length decoding processing and the de-blocking filtering processing by dedicated hardware, a significant speedup can be realized.

Moreover, while the above description takes up an example in which the video decoder of the present embodiment is implemented by use of the signal-processing apparatus of the second embodiment of the present invention shown in FIG. 2, the video decoder of the present embodiment can be realized by use of the signal-processing apparatus of the first embodiment of the present invention. Moreover, the processing object that each processor takes charge of can be changed as required.

(Fifth Embodiment)

FIG. 11 is a block diagram of an audio encoder according to a fifth embodiment of the present invention. FIG. 12 is a block diagram of an audio decoder according to the fifth embodiment of the present invention.

In the audio encoder shown in FIG. 11, an audio input 353 undergoes compression processing including sampling and quantization at a compressor 351, undergoes encoding at an encoder 352, and is outputted as an encoded audio output 354.

In the audio decoder shown in FIG. 12, an encoded audio input 363 is decoded by a decoder 361, and is inversely quantized by a decompressor 362 to be decompressed.

Audio encoding and decoding can be processed by any processor because the required processing amount is small compared to that of video encoding and decoding according to the MPEG-4 AVC.

When the audio encoder and the audio decoder of the present embodiment are implemented by use of the signal-processing apparatus of the first embodiment, the processing of the compressor 351 and the encoder 352 shown in FIG. 11 and the processing of the decoder 361 and the decompressor 362 shown in FIG. 12 are performed by the instruction-parallel processor 100 shown in FIG. 1. This processing can be performed with a sufficient margin.

(Sixth Embodiment)

FIG. 13 is a block diagram of an AV reproduction system according to a sixth embodiment of the present invention.

The AV reproduction system of the present embodiment has a reproducer 801, a demodulator/error corrector 802, an AV decoder 803, a memory 804, and D/A converters 805 and 807. The AV decoder 803 has a video decoder 803A and an audio decoder 803B.

The video decoder 803A is the video decoder of the fourth embodiment of the present invention shown in FIG. 10, and can be implemented by use of the signal-processing apparatus of the first embodiment of the present invention or the signal-processing apparatus of the second embodiment.

The audio decoder 803B is an audio decoder of the fifth embodiment of the present invention shown in FIG. 12. As mentioned in the fifth embodiment, in the processing of the audio decoder of the fifth embodiment, since the required processing amount is small compared to that of the image data processing, the processing can be performed in parallel by the instruction-parallel processor 100 (FIG. 1 or FIG. 2) of the signal-processing apparatus of the first embodiment or the signal-processing apparatus of the second embodiment applied to the video decoder 803A, and it is unnecessary to provide a different processor. Therefore, the AV decoder 803 can be structured by one signal-processing apparatus of the first embodiment or one signal-processing apparatus of the second embodiment.

The reproducer 801 reproduces media on which coded AV signals are recorded, and outputs reproduction signals. The reproducer 801 may be any reproducer that is capable of reproducing media on which coded AV signals according to the MPEG-4 AVC standard are recorded such as a DVD video reproducer or an HD (hard disk) video reproducer.

The demodulator/error corrector 802 demodulates the signal reproduced by the reproducer 801, error-corrects the demodulated signal, and outputs the error-corrected signal to the AV decoder 803.

The video decoder 803A of the AV decoder 803 decodes the coded video signal and outputs the decoded signal, and the outputted signal is converted to an analog signal by the D/A converter 805 and outputted as a video output 806.

The audio decoder 803B of the AV decoder 803 decodes the coded audio signal and outputs the decoded signal, and the outputted signal is converted to an analog signal by the D/A converter 807 and outputted as an audio output 808.

In the memory 804, AV signals before decoding, during decoding and/or after decoding, and other data are stored.

In the AV reproduction system of the present embodiment, part or all of the functions of the demodulator/error corrector 802 may be provided to the reproducer 801.

The AV reproduction system of the present embodiment can also be used for receiving MPEG-4 AVC-compliant AV signals transmitted via CATV, the Internet or satellite communications, and for demodulating and decoding them. In this case, the received signal is inputted to the demodulator/error corrector 802 and decoded by the above-described process. Further, the AV reproduction system of the present embodiment can be applied as a digital television by displaying the video output on a display.

(Seventh Embodiment)

FIG. 14 is a block diagram of an AV recording system according to aseventh embodiment of the present invention.

The AV recording system of the present embodiment has an AV encoder 825,an error correcting code adder/modulator 827, a recorder 828, a memory826 and A/D converters 822 and 824. The AV encoder 825 has a videoencoder 825A and an audio encoder 825B.

The video encoder 825A is the video encoder of the third embodiment ofthe present invention shown in FIG. 3, and can be implemented by use ofthe signal-processing apparatus of the first embodiment of the presentinvention or the signal-processing apparatus of the second embodiment.

The audio encoder 825B is an audio encoder of the fifth embodiment ofthe present invention shown in FIG. 11. As mentioned in the fifthembodiment, in the processing of the audio encoder of the fifthembodiment, since the required processing amount is small compared tothat of the image data processing, parallel processing can be performedby the instruction-parallel processor 100 (FIG. 1 or FIG. 2) of thesignal-processing apparatus of the first embodiment or thesignal-processing apparatus of the second embodiment applied to thevideo encoder 825A, and it is unnecessary to provide a differentprocessor. Therefore, the AV encoder 825 can be structured by onesignal-processing apparatus of the first embodiment or onesignal-processing apparatus of the second embodiment.

The outline of the operation of the AV recording system of the present embodiment is now described.

A video input 821 is A/D converted by the A/D converter 822, an audio input 823 is A/D converted by the A/D converter 824, and both are outputted to the AV encoder 825.

The video encoder 825A of the AV encoder 825 encodes the inputted video signal according to the MPEG-4 AVC specifications, and outputs the signal as an encoded video bit stream. Likewise, the audio encoder 825B encodes the inputted audio signal according to the MPEG-4 AVC specifications, and outputs the signal as an encoded audio bit stream.

The error corrector/modulator 827 adds an error correcting code to the encoded video bit stream and the encoded audio bit stream outputted by the AV encoder 825, modulates the bit streams, and outputs them to the recorder 828.

The recorder 828 records the modulated AV signal onto a recording medium. The recording medium may be an optical medium such as a DVD, a magnetic recording medium such as an HD (hard disk), or a semiconductor memory.

The memory 826 stores AV signals before encoding, during encoding and/or after encoding by the AV encoder 825, as well as other data.

In the AV recording system of the present embodiment, part or all of the functions of the error corrector/modulator 827 may be included in the recorder 828.

The AV recording system of the present embodiment can be used as a video camera system in which a video camera is connected to an input and the signal therefrom is encoded and recorded according to the MPEG-4 AVC specifications.

(Eighth Embodiment)

FIG. 15 is a block diagram of an AV recording/reproduction system according to an eighth embodiment of the present invention. The AV recording/reproduction system of the present embodiment has a controller 840, a recorder/reproducer 841, a modem/error processor 842, an AV encoder/decoder 843, an AV interface 845 and a memory 844. The AV encoder/decoder 843 has a video encoder/decoder 843A and an audio encoder/decoder 843B. The AV interface 845 has video input and output, and audio input and output.

Functionally, the AV encoder/decoder 843 is equivalent to the video encoder of the third embodiment of the present invention, the video decoder of the fourth embodiment and the audio encoder and the audio decoder of the fifth embodiment, and is structured by one signal-processing apparatus of the first embodiment or one signal-processing apparatus of the second embodiment. Descriptions of the operation thereof are omitted in this embodiment because they have already been given.

The recorder/reproducer 841 records/reproduces modulated AV signals according to the MPEG-4 AVC specifications. The recording medium may be an optical medium such as a DVD, a magnetic recording medium such as an HD (hard disk), or a semiconductor memory. The recorder/reproducer 841 has a different recording/reproduction mechanism according to the recording medium being used.

The modem/error processor 842, at the time of recording, adds an error correcting code to the video bit stream and the audio bit stream encoded by the AV encoder/decoder 843, modulates the bit streams, and transmits them to the recorder/reproducer 841. The modem/error processor 842, at the time of reproduction, demodulates the AV signal reproduced by the recorder/reproducer 841, error-corrects the demodulated signal, and then transmits the video bit stream and the audio bit stream to the AV encoder/decoder 843.
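The round trip of adding an error correcting code at the time of recording and error-correcting at the time of reproduction can be illustrated with a deliberately simple code. The following Python sketch uses a three-fold repetition code with majority voting; this code is chosen only for illustration, since the embodiment does not specify which error correcting code is used, and the function names are hypothetical. Modulation and demodulation are omitted.

```python
def add_error_correcting_code(bits):
    """Recording side: repeat every bit three times (repetition code).

    Illustrative stand-in only; the embodiment does not name a specific code.
    """
    return [b for b in bits for _ in range(3)]


def error_correct(coded):
    """Reproduction side: majority vote over each group of three bits.

    Corrects any single flipped bit within a group of three.
    """
    if len(coded) % 3 != 0:
        raise ValueError("coded length must be a multiple of 3")
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


# Hypothetical five-bit fragment of an encoded bit stream.
bitstream = [1, 0, 1, 1, 0]
coded = add_error_correcting_code(bitstream)

# Simulate a single-bit error introduced by the recording medium.
damaged = list(coded)
damaged[4] ^= 1

recovered = error_correct(damaged)
```

A practical system would use a far more efficient code; the sketch only shows that the redundancy added at recording is what makes correction at reproduction possible.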

The AV interface 845, at the time of reproduction, D/A converts the video signal and the audio signal decoded by the AV encoder/decoder 843, and outputs a video output 846 and an audio output 848. The AV interface 845, at the time of recording, A/D converts a video input 847 and an audio input 849, and transmits them to the AV encoder/decoder 843.

The memory 844 stores AV signals before encoding, during encoding and/or after encoding, AV signals before decoding, during decoding and/or after decoding by the AV encoder/decoder 843, and other data.

The controller 840 controls the recorder/reproducer 841, the modem/error processor 842, the AV encoder/decoder 843 and the AV interface 845 to switch their functions between recording and reproduction, and controls data transfer.
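The switching performed by the controller 840 can be pictured as selecting one of two signal paths through the blocks of FIG. 15. The following Python sketch is illustrative only: the names `Mode` and `signal_path` are hypothetical, and the embodiment does not prescribe any software interface for the controller. The block order in each path follows the description of the eighth embodiment.

```python
from enum import Enum


class Mode(Enum):
    """Operating modes between which the controller 840 switches."""
    RECORD = "record"
    REPRODUCE = "reproduce"


# Order of the blocks that a signal passes through in each mode,
# using the reference numerals of the eighth embodiment.
_PATHS = {
    Mode.RECORD: [
        "AV interface 845: A/D convert video input 847 and audio input 849",
        "AV encoder/decoder 843: encode",
        "modem/error processor 842: add error correcting code, modulate",
        "recorder/reproducer 841: record",
    ],
    Mode.REPRODUCE: [
        "recorder/reproducer 841: reproduce",
        "modem/error processor 842: demodulate, error-correct",
        "AV encoder/decoder 843: decode",
        "AV interface 845: D/A convert to video output 846 and audio output 848",
    ],
}


def signal_path(mode: Mode) -> list:
    """Return the ordered list of processing blocks for the given mode."""
    return list(_PATHS[mode])


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value)
        for step in signal_path(mode):
            print("  ", step)
```

The two paths traverse the same four blocks in opposite order, which is why a single controller can share the hardware between recording and reproduction.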

In the AV recording/reproduction system of the present embodiment, part or all of the functions of the modem/error processor 842 may be included in the recorder/reproducer 841.

As described above in detail, the signal-processing apparatus of the present invention and an electronic apparatus using the same are expected to be applied to various electronic apparatuses to which the MPEG-4 AVC encoding standard is applied.

The application to electronic apparatuses covers a wide range, from domestic stationary terminals to battery-driven mobile terminals, such as DVD systems, video camera systems and picture-phone systems for mobile telephones, which currently operate according to MPEG-2.

In these systems, the performance required of the LSI realizing the MPEG-4 AVC standard differs according to the manner of system application. For stationary systems, since large image sizes are handled, processing performance is important, whereas for mobile terminals, reduction in power consumption is important to increase the battery life. The signal-processing apparatus of the present invention and an electronic apparatus using the same are applicable to both. That is, combining the instruction-parallel processor, the data-parallel processor and the dedicated hardware enables both the improvement in processing performance and the reduction in power consumption.

The signal-processing apparatus of the present invention comprises a plurality of SIMD processors (in the example of FIG. 1, the first data-parallel processor 101 and the second data-parallel processor 102). One SIMD processor includes eight processing elements, and eight data streams can be processed in parallel by one instruction. By changing the number of provided SIMD processors according to the purpose for which the signal-processing apparatus is used, various performance requirements can be met without the LSI architecture being changed.

For example, in the signal-processing apparatus for mobile terminals requiring low power consumption, by providing two SIMD processors, the degree of parallelism can be made 16, so that a low-voltage operation and the reduction in operating frequency are enabled.
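The relationship between the number of SIMD processors, the degree of parallelism and the operating frequency can be checked with simple arithmetic. The following Python sketch is illustrative only. It assumes ideal scaling (every processing element completes one operation per cycle and the work divides evenly), uses the well-known approximation that CMOS dynamic power is proportional to C·V²·f, and takes a hypothetical workload and a hypothetical voltage ratio; none of these figures appear in the embodiments.

```python
PROCESSING_ELEMENTS_PER_SIMD = 8  # stated in the description


def degree_of_parallelism(num_simd: int) -> int:
    """Number of data streams processed by one instruction."""
    return num_simd * PROCESSING_ELEMENTS_PER_SIMD


def required_clock_hz(ops_per_second: float, num_simd: int) -> float:
    """Clock needed to sustain a workload, assuming ideal scaling
    (one operation per processing element per cycle)."""
    return ops_per_second / degree_of_parallelism(num_simd)


def relative_dynamic_power(capacitance_ratio: float,
                           voltage_ratio: float,
                           frequency_ratio: float) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f."""
    return capacitance_ratio * voltage_ratio ** 2 * frequency_ratio


WORKLOAD_OPS_PER_SECOND = 1.6e9  # hypothetical workload

clock_one_simd = required_clock_hz(WORKLOAD_OPS_PER_SECOND, 1)
clock_two_simd = required_clock_hz(WORKLOAD_OPS_PER_SECOND, 2)
frequency_ratio = clock_two_simd / clock_one_simd

# Doubling the hardware roughly doubles the switched capacitance.
power_same_voltage = relative_dynamic_power(2.0, 1.0, frequency_ratio)
# Hypothetical: the lower clock allows the supply voltage to drop to 80 %.
power_lower_voltage = relative_dynamic_power(2.0, 0.8, frequency_ratio)

if __name__ == "__main__":
    print(degree_of_parallelism(2), clock_one_simd, clock_two_simd)
    print(power_same_voltage, power_lower_voltage)
```

Under these assumptions, halving the clock alone does not reduce dynamic power, because twice as much hardware is switching; the saving comes from the lower supply voltage that the reduced operating frequency permits.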

Moreover, instead of operating with a degree of parallelism of 16, the two SIMD processors, each comprising eight processing elements, can be used separately and caused to perform different processing.

By dividing the entire processor and performing parallel processing such that the second SIMD processor performs DCT processing while the first SIMD processor is performing the pixel value calculation of the motion compensation, a plurality of kinds of processing can be performed while the operating ratios of the processors are maintained. Consequently, the calculation performance can be significantly improved.
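The effect on the operating ratio can be illustrated with a small cycle-count model. The following Python sketch is illustrative only. It assumes that each kind of processing exposes at most eight independent data items per instruction, so that a single task cannot fill sixteen processing elements; the work amounts are hypothetical, and the function names are not taken from the embodiments.

```python
from math import ceil

TOTAL_LANES = 16  # two SIMD processors of eight processing elements each


def cycles(items: int, width: int, lanes: int) -> int:
    """Cycles to process `items` data items when one instruction can use
    at most `width` of the `lanes` available processing elements."""
    return ceil(items / min(width, lanes))


def cycles_ganged(tasks) -> int:
    """All 16 processing elements run one task at a time, one after another."""
    return sum(cycles(items, width, TOTAL_LANES) for items, width in tasks)


def cycles_split(task_a, task_b) -> int:
    """Each SIMD processor (8 processing elements) runs its own task
    concurrently with the other."""
    half = TOTAL_LANES // 2
    return max(cycles(*task_a, half), cycles(*task_b, half))


def operating_ratio(tasks, total_cycles: int) -> float:
    """Fraction of processing-element cycles that do useful work."""
    useful = sum(items for items, _ in tasks)
    return useful / (TOTAL_LANES * total_cycles)


# Hypothetical work: (number of data items, items usable per instruction).
motion_compensation = (256, 8)
dct = (256, 8)
tasks = [motion_compensation, dct]

ganged = cycles_ganged(tasks)
split = cycles_split(motion_compensation, dct)

ratio_ganged = operating_ratio(tasks, ganged)
ratio_split = operating_ratio(tasks, split)
```

In this model the split configuration finishes in half the cycles of the ganged configuration and keeps every processing element busy, which corresponds to the maintained operating ratio described above. If a task could use all sixteen processing elements per instruction, the two configurations would take the same number of cycles.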

While applications conforming to the MPEG-4 AVC standard are described in the above-described embodiments, the present invention is not limited to these applications. The gist of the present invention is to realize the improvement in processing performance and the reduction in power consumption by combining the instruction-parallel processor, the data-parallel processor and the dedicated hardware, and various applications are possible without departing from the gist of the invention.

According to the present invention, a signal-processing apparatus capable of performing high-performance and high-efficiency processing for image processing requiring a large data processing amount, such as the encoding/decoding processing of the MPEG-4 AVC, and an electronic apparatus using the same can be provided.

Having described preferred embodiments of the invention with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention as defined in the appended claims.

1. A signal-processing apparatus comprising: an instruction-parallel processor; a data-parallel processor; and a plurality of pieces of dedicated hardware, wherein said instruction-parallel processor performs audio compression/decompression and non-routine or less-heavy operation of image compression/decompression, wherein said data-parallel processor performs, of the image compression/decompression, routine or heavy operation, and wherein said plurality of pieces of dedicated hardware perform, of the image compression/decompression, comparatively heavy processing.
2. A signal-processing apparatus according to claim 1, further comprising: a first instruction bus; a first data bus; a first shared memory; and an input and output interface, wherein each of said instruction-parallel processor, said data-parallel processor, said plurality of pieces of dedicated hardware and said input and output interface comprises a local memory, said instruction-parallel processor, said data-parallel processor and said plurality of pieces of dedicated hardware are connected to said first instruction bus, wherein an instruction for said instruction-parallel processor to control said data-parallel processor and said plurality of hardware is communicated through said first instruction bus, and wherein said local memory of said instruction-parallel data processor, said local memory of said data-parallel processor, said local memories of said plurality of pieces of dedicated hardware, said first shared memory and said local memory of said input and output interface are connected to said first data bus, and data transfer is performed among these memories.
3. A signal-processing apparatus according to claim 2, further comprising: a second data bus; a second shared memory; and a bridge unit connecting said first data bus and said second data bus, wherein said local memory of said data-parallel processor, said local memories of said plurality of pieces of dedicated hardware, said first shared memory and said local memory of said input and output interface are connected to said first data bus, and data transfer is performed among these memories, wherein said local memory of said instruction-parallel processor and said second shared memory are connected to said second data bus, and data transfer is performed between these memories, and wherein data transfer between said memories connected to said first data bus and said memories connected to said second data bus is performed through said bridge unit.
4. A signal-processing apparatus according to claim 3, further comprising a control processor, wherein said instruction-parallel processor controls said data-parallel processor and said plurality of pieces of dedicated hardware through said control processor.
5. A signal-processing apparatus according to claim 4, further comprising a second instruction bus, wherein said instruction-parallel processor, said control processor and a part of said plurality of pieces of dedicated hardware are connected to said first instruction bus, wherein said control processor, said data-parallel processor and the remainder of said plurality of pieces of dedicated hardware, the remainder being not connected to said first instruction bus, are connected to said second instruction bus, and wherein said instruction-parallel processor controls the part of said plurality of pieces of dedicated hardware, and controls, through said control processor, said data-parallel processor and the remainder of said plurality of pieces of hardware.
6. A signal-processing apparatus according to claim 1, wherein said data-parallel processor comprises a plurality of processing units, and wherein a number of said plurality of processing units of said data-parallel processor is determined according to a compressed or decompressed image size.
7. A signal-processing apparatus according to claim 1, wherein said data-parallel processor comprises a plurality of processing units, and wherein a number of said plurality of processing units of said data-parallel processor is determined according to at least one of a power supply voltage and an operating frequency.
8. A signal-processing apparatus according to claim 1, wherein processing performed by said plurality of pieces of dedicated hardware includes at least one of variable-length coding processing, variable-length decoding processing, video input and output processing, motion detection processing, motion compensation processing, DCT (discrete cosine transform) processing, inverse DCT processing, quantization processing, inverse quantization processing and de-blocking filtering processing.
9. A signal-processing apparatus according to claim 5, wherein processing performed by the part, of said plurality of pieces of dedicated hardware, the part being connected to said first instruction bus, is variable-length coding processing and/or variable-length decoding processing.
10. An electronic apparatus comprising a signal-processing apparatus, said signal-processing apparatus comprising: an instruction-parallel processor; a data-parallel processor; and a plurality of pieces of dedicated hardware, wherein said instruction-parallel processor performs audio compression/decompression and non-routine or less-heavy operation of image compression/decompression, wherein said data-parallel processor performs, of the image compression/decompression, routine or heavy operation, wherein said plurality of pieces of dedicated hardware perform, of the image compression/decompression, comparatively heavy processing, and wherein said signal-processing apparatus performs at least one of audio compression processing, audio decompression processing, image compression processing and image decompression processing.
11. An electronic apparatus according to claim 10, further comprising: a reproducer; a demodulator/error corrector; a memory; and a plurality of D/A converters, wherein said reproducer reproduces modulated coded signals from a recording medium loaded therein, wherein said demodulator/error corrector demodulates the modulated coded signals reproduced by said reproducer, error-corrects the demodulated signals, and outputs the error-corrected signals as coded data, wherein said signal-processing apparatus decodes the coded data outputted by said demodulator/error corrector, and outputs the decoded data as video data and audio data, wherein said memory stores data before decoding, during decoding and after decoding, and wherein said plurality of D/A converters D/A-convert the video data and the audio data outputted by said signal-processing apparatus, and outputs an analog video output and an analog audio output.
12. An electronic apparatus according to claim 10, further comprising: a plurality of A/D converters; a memory; an error corrector/modulator; and a recorder, wherein said plurality of A/D converters A/D convert an inputted analog video input and analog audio input, and outputs video data and audio data, wherein said signal-processing apparatus encodes the video data and the audio data outputted by said plurality of A/D converters, and outputs coded data, wherein said memory stores data before encoding, during encoding and after encoding, wherein said error corrector/modulator adds an error correcting code to the coded data encoded by said signal-processing apparatus, modulates the coded data, and outputs the modulated data as coded signals, and wherein said recorder records the coded signals outputted by said error corrector/modulator onto a recording medium loaded therein.
13. An electronic apparatus comprising the electronic apparatus according to claim 11 and the electronic apparatus according to claim 12.
14. An electronic apparatus according to claim 11, wherein the recording medium is an optical disk.
15. An electronic apparatus according to claim 11, wherein the recording medium is a magnetic disk.
16. An electronic apparatus according to claim 11, wherein the recording medium is a semiconductor memory.